maximum likelihood method
Performance-guaranteed regularization in maximum likelihood method: Gauge symmetry in Kullback-Leibler divergence
The maximum likelihood method is the best-known method for estimating the probabilities behind data. However, the conventional method fits the probability model closest to the empirical distribution, resulting in overfitting. Regularization methods prevent the model from being excessively close to this wrong probability, but little is known systematically about their performance. The idea of regularization is similar to error-correcting codes, which obtain optimal decoding by mixing suboptimal solutions with an incorrectly received code. The optimal decoding in error-correcting codes is achieved based on gauge symmetry. We propose a theoretically guaranteed regularization for the maximum likelihood method by focusing on a gauge symmetry in the Kullback-Leibler divergence. In our approach, we obtain the optimal model without the need to search for the hyperparameters that frequently appear in regularization.
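The overfitting described above is easy to reproduce: with a small sample, the maximum likelihood estimate of a categorical distribution is just the empirical frequencies, which can assign probability zero to unseen outcomes. A minimal Python sketch, using simple additive smoothing as the regularizer (not the gauge-symmetry method proposed in the paper; `alpha` is an illustrative hyperparameter):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# True categorical distribution over 6 outcomes.
p_true = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])

# Small sample: the MLE is the empirical frequency vector, which may
# put zero mass on outcomes that simply were not drawn (overfitting;
# KL from the truth can then be infinite, hence the small epsilon).
counts = rng.multinomial(20, p_true)
p_mle = counts / counts.sum()

# Additive smoothing pulls the estimate away from the empirical
# distribution and guarantees strictly positive probabilities.
alpha = 1.0
p_reg = (counts + alpha) / (counts.sum() + alpha * len(counts))

print(kl(p_true, p_mle + 1e-12), kl(p_true, p_reg))
```

The point of the sketch is only the failure mode: the raw MLE clings to the sample, while any regularizer trades a little bias for robustness, at the cost of a hyperparameter the paper's method aims to eliminate.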
A Gentle Introduction to Bayesian Inference
In this article, we have seen the Bayesian approach in action with the help of a small example. It uses prior knowledge and updates it with observed data to form a posterior, exactly as humans do intuitively. This approach is clearly better than discarding the data and proceeding with the prior alone. It is also more general than the maximum likelihood method: choose a flat prior, i.e. one that assigns the same probability (or density) to every possible value of θ and is essentially a constant, and the posterior becomes proportional to the likelihood, so its mode is exactly the maximum likelihood estimate. Furthermore, the Bayesian method gives you a full distribution over the parameters, while the maximum likelihood method yields only a point estimate.
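The flat-prior argument can be checked concretely with the conjugate Beta-Binomial model. The coin-flip numbers below are an illustration, not taken from the article:

```python
from math import isclose

# Coin-flip example: with a flat Beta(1, 1) prior, observing k heads in
# n flips gives the posterior Beta(k + 1, n - k + 1). Its mode equals
# the maximum likelihood estimate k / n, so the Bayesian answer contains
# the MLE as a special case -- and also provides a full distribution.
n, k = 10, 7                       # 7 heads in 10 flips
a_post, b_post = k + 1, n - k + 1  # conjugate Beta update

posterior_mode = (a_post - 1) / (a_post + b_post - 2)
posterior_mean = a_post / (a_post + b_post)
mle = k / n

print(posterior_mode, mle, posterior_mean)  # 0.7 0.7 0.666...
```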
Efficient Computation of the Quasi-Likelihood Function for Discretely Observed Diffusion Processes
Höök, Lars Josef, Lindström, Erik
We introduce a simple method for nearly simultaneous computation of all moments needed for quasi maximum likelihood estimation of parameters in discretely observed stochastic differential equations commonly seen in finance. The method proposed in this paper is not restricted to any particular dynamics of the differential equation and is virtually insensitive to the sampling interval. The key contribution of the paper is that the computational complexity is sublinear in the number of observations, as we compute all moments through a single operation. Furthermore, that operation can be done offline. The simulations show that the method is unbiased for all practical purposes under any sampling design, including random sampling, and that its computational cost is comparable to (and, for moderate and large data sets, actually lower than) that of the simple, often severely biased, Euler-Maruyama approximation.
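The Euler-Maruyama bias mentioned at the end of the abstract can be seen directly on an Ornstein-Uhlenbeck process, a standard test case; the process and parameter values below are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ornstein-Uhlenbeck process dX = kappa*(theta - X) dt + sigma dW, a
# simple stand-in for the financial diffusions the paper targets.
kappa, theta, sigma = 2.0, 0.5, 0.3

def euler_maruyama(x0, dt, n_steps):
    """Simulate one path under the Euler-Maruyama discretization."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        drift = kappa * (theta - x[i])
        x[i + 1] = x[i] + drift * dt + sigma * np.sqrt(dt) * rng.normal()
    return x

path = euler_maruyama(x0=1.0, dt=0.01, n_steps=500)

# The Euler scheme's one-step conditional mean x + kappa*(theta - x)*dt
# differs from the exact OU mean theta + (x - theta)*exp(-kappa*dt);
# the gap grows with the sampling interval dt, which is the bias the
# abstract contrasts its moment computation against.
x0, dt = 1.0, 0.25
euler_mean = x0 + kappa * (theta - x0) * dt   # = 0.75
exact_mean = theta + (x0 - theta) * np.exp(-kappa * dt)
print(euler_mean, exact_mean)
```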
Asymptotic Accuracy of Distribution-Based Estimation for Latent Variables
Hierarchical statistical models are widely employed in information science and data engineering. The models consist of two types of variables: observable variables that represent the given data and latent variables for the unobservable labels. An asymptotic analysis of the models plays an important role in evaluating the learning process; the result of the analysis is applied not only to theoretical but also to practical situations, such as optimal model selection and active learning. There are many studies of generalization errors, which measure the prediction accuracy of the observable variables. However, the accuracy of estimating the latent variables has not yet been elucidated. For a quantitative evaluation of this, the present paper formulates distribution-based functions for the errors in the estimation of the latent variables. The asymptotic behavior is analyzed for both the maximum likelihood and the Bayes methods.
Hierarchical Mixtures-of-Experts for Exponential Family Regression Models with Generalized Linear Mean Functions: A Survey of Approximation and Consistency Results
Jiang, Wenxin, Tanner, Martin A.
We investigate a class of hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form psi(alpha + x^T beta) are mixed. Here psi(.) is the inverse link function. Suppose the true response y follows an exponential family regression model with mean function belonging to a class of smooth functions of the form psi(h(x)), where h(.) is in W_2^infinity (a Sobolev class over [0,1]^s). It is shown that the HME probability density functions can approximate the true density at a rate of O(m^{-2/s}) in the L_p norm, and at a rate of O(m^{-4/s}) in Kullback-Leibler divergence. These rates can be achieved within the family of HME structures with no more than s layers, where s is the dimension of the predictor x. It is also shown that likelihood-based inference based on HME is consistent in recovering the truth, in the sense that as the sample size n and the number of experts m both increase, the mean square error of the predicted mean response goes to zero. Conditions for such results to hold are stated and discussed.
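A minimal sketch of a one-layer mixture-of-experts mean of the kind analyzed above, with a softmax gate and sigmoid inverse link psi; all parameter values are random illustrations rather than fitted models:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-layer mixture-of-experts on an s-dimensional input x with m
# experts. Each expert is a GLM mean psi(a_j + x @ b_j) with inverse
# link psi = sigmoid; a softmax gate mixes the experts.
s, m = 3, 4
gate_w = rng.normal(size=(m, s))   # gating-network weights
a = rng.normal(size=m)             # expert intercepts (alpha_j)
b = rng.normal(size=(m, s))        # expert slopes (beta_j)

def moe_mean(x):
    """Gated convex combination of the experts' GLM mean responses."""
    logits = gate_w @ x
    g = np.exp(logits - logits.max())
    g /= g.sum()                   # softmax gate: nonnegative, sums to 1
    experts = sigmoid(a + b @ x)   # each expert's mean response in (0, 1)
    return float(g @ experts)

x = rng.normal(size=s)
print(moe_mean(x))
```

Because the gate is a convex combination and each sigmoid expert lies in (0, 1), the mixture mean stays in (0, 1) as well; the approximation rates in the abstract concern how well such mixtures track a smooth true mean psi(h(x)) as m grows.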
Selection of tuning parameters in bridge regression models via Bayesian information criterion
We consider bridge linear regression modeling, which can produce either a sparse or a non-sparse model. A crucial point in the model-building process is the selection of the adjustment parameters, namely the regularization parameter and the tuning parameter of the bridge regression model. The choice of these parameters can be viewed as a model selection and evaluation problem. We propose a model selection criterion for evaluating bridge regression models based on a Bayesian approach. This criterion enables us to select the adjustment parameters objectively. We investigate the effectiveness of the proposed modeling strategy through numerical examples.
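BIC-based tuning can be sketched for the special case q = 2 of the bridge penalty sum(|beta_j|^q), i.e. ridge regression, where the estimator has a closed form. This grid search is only an illustration of the idea of selecting the regularization parameter by an information criterion, not the criterion proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic regression data (illustrative): a few true signals, some noise.
n, p = 50, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def bic_ridge(lam):
    """BIC for the ridge fit (bridge with q = 2) at penalty weight lam."""
    A = X.T @ X + lam * np.eye(p)
    beta = np.linalg.solve(A, X.T @ y)
    resid = y - X @ beta
    # Effective degrees of freedom: trace of the ridge hat matrix.
    df = np.trace(X @ np.linalg.solve(A, X.T))
    return n * np.log(resid @ resid / n) + np.log(n) * df

# Pick the penalty weight minimizing BIC over a candidate grid.
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(grid, key=bic_ridge)
print(best)
```

The effective-degrees-of-freedom term is what makes the criterion penalize heavily regularized fits less than unpenalized ones; for general q the degrees of freedom no longer have this simple trace form, which is part of what a dedicated criterion for bridge models must address.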
ART2/BP architecture for adaptive estimation of dynamic processes
The goal has been to construct a supervised artificial neural network that incrementally learns an unknown mapping. As a result, a network combining ART2 and backpropagation, called an "ART2/BP" network, is proposed. The ART2 network is used to build and focus a supervised backpropagation network. The ART2/BP network has the advantage of being able to dynamically expand itself in response to input patterns containing new information. Simulation results show that the ART2/BP network outperforms a classical maximum likelihood method for the estimation of a discrete, dynamic, and nonlinear transfer function.